MSCA 32018 - NLP Summer 2022

FINAL PROJECT (Google Colab)

ILLINOIS POPULATION NEWS ANALYSIS

Author: Akhir Syabani
UCID: 12281056

Below is the final version of the analysis, in chronological order (trials of other methods are not included).

1. Load the original dataset

2. Filtering and Clean-Up: Select Only Relevant Titles

3. Filtering and Clean-Up: Split & Explode Text into Sentences
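
A minimal sketch of the split-and-explode step, using only the standard library. The regex splitter, the field names (`date`, `text`) and the sample row are illustrative assumptions, not the notebook's actual code or data; a tokenizer such as NLTK's `sent_tokenize` would handle abbreviations better.

```python
import re

# Naive splitter (assumption): break after ., ! or ? when followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

def split_sentences(text):
    """Return the non-empty sentences found in text."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]

def explode_rows(rows, text_key="text"):
    """Yield one copy of each row per sentence, keeping the other fields."""
    for row in rows:
        for sentence in split_sentences(row[text_key]):
            yield {**row, text_key: sentence}

# Made-up sample row, for illustration only.
rows = [{"date": "2022-05-19", "text": "Illinois grew. The census missed it!"}]
exploded = list(explode_rows(rows))
```

Each output row carries one sentence plus the original metadata, so later steps can score sentences individually.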

4. Checking Data Distribution

5. Filtering & Clean-Up: Remove Duplicates
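
One possible way to drop duplicates, sketched with the standard library. Normalizing case and whitespace before comparing is an assumption about what should count as a duplicate; the field name `text` and the sample rows are hypothetical.

```python
def drop_duplicates(rows, key="text"):
    """Keep the first row for each normalized text value, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        # Normalize case and whitespace so trivially different copies match.
        fingerprint = " ".join(row[key].lower().split())
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique

# Made-up sample rows, for illustration only.
rows = [
    {"text": "Illinois population grew"},
    {"text": "illinois  population grew "},
    {"text": "Census undercounted Illinois"},
]
deduped = drop_duplicates(rows)
```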

6. Sentiment Analysis: VADER Lexicon

VADER is optimized for social media text: https://www.analyticsvidhya.com/blog/2021/01/sentiment-analysis-vader-or-textblob/
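
A hedged sketch of how VADER scores could be turned into the three sentiment groups used in this notebook. It assumes NLTK's VADER implementation and the conventional compound-score cutoffs of +/-0.05; the notebook's actual thresholds are not stated, and the function names are hypothetical.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

def score_texts(texts):
    """Score texts with NLTK's VADER (needs nltk and the vader_lexicon resource)."""
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    analyzer = SentimentIntensityAnalyzer()
    results = []
    for text in texts:
        compound = analyzer.polarity_scores(text)["compound"]
        results.append((text, compound, label_from_compound(compound)))
    return results
```

Keeping the thresholding in its own function makes it easy to try other cutoffs without re-scoring the texts.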

7. Filtering & Clean-Up: Remove Errors/Outliers (Extremely Long, Misposted Titles)
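
A small sketch of a length-based filter for misposted titles. The 40-word cutoff and the field name `title` are placeholders chosen for illustration; a sensible cutoff would come from the length distribution checked in step 4.

```python
def remove_long_titles(rows, max_words=40, key="title"):
    """Drop rows whose title has more than max_words whitespace-separated words."""
    return [row for row in rows if len(row[key].split()) <= max_words]

# Made-up sample rows: one normal title, one article body posted as a title.
rows = [
    {"title": "Illinois population estimate revised"},
    {"title": " ".join(["word"] * 200)},
]
kept = remove_long_titles(rows)
```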

8. Filtering & Clean-Up: Remove Punctuation (Additional Column)
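
A minimal sketch of adding a punctuation-free column while keeping the original text. The column names `text` and `text_clean` are assumptions. Note that `string.punctuation` covers ASCII punctuation only, so curly quotes would survive this version.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def strip_punctuation(text):
    """Remove ASCII punctuation and collapse repeated whitespace."""
    return " ".join(text.translate(_PUNCT_TABLE).split())

def add_clean_column(rows, source="text", target="text_clean"):
    """Return copies of rows with an extra punctuation-free column."""
    return [{**row, target: strip_punctuation(row[source])} for row in rows]

# Made-up sample row, for illustration only.
cleaned = add_clean_column([{"text": "Illinois' population: up, not down!"}])
```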

9. Visualize Word Cloud per Sentiment

Word Cloud for Positive Sentiments

Word Cloud for Negative Sentiments

Word Cloud for Neutral Sentiments
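
The per-sentiment word clouds can be sketched as a word-frequency count followed by a rendering call. The sketch below assumes the `wordcloud` package for rendering; the tiny stopword list, the column names and the sample rows are illustrative only.

```python
from collections import Counter

# Tiny illustrative stopword list (assumption); a real run would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "for", "on"}

def word_frequencies(rows, sentiment, text_key="text_clean", label_key="sentiment"):
    """Count non-stopword tokens across all rows with the given sentiment."""
    counts = Counter()
    for row in rows:
        if row[label_key] == sentiment:
            counts.update(w for w in row[text_key].lower().split() if w not in STOPWORDS)
    return counts

def build_cloud(frequencies):
    """Render a word cloud from a frequency mapping (needs the wordcloud package)."""
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    return cloud.generate_from_frequencies(frequencies)

# Made-up sample rows, for illustration only.
rows = [
    {"text_clean": "population growth in Illinois", "sentiment": "positive"},
    {"text_clean": "growth of the suburbs", "sentiment": "positive"},
    {"text_clean": "population loss", "sentiment": "negative"},
]
positive_freqs = word_frequencies(rows, "positive")
```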

10. Finding Major Keywords per Sentiment for Cross-Checking

Keywords to check:

Positive:

  1. Growth
  2. Growing
  3. Grow/grew/grows
  4. Increase

Negative:

  1. Lost
  2. Loss
  3. Losing
  4. Decline

Neutral:

  1. COVID
  2. undercounted

Each keyword needs to be examined within its own sentiment group and against the other sentiment groups, with manual adjustments made where necessary.
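
The cross-check can be sketched as a count of how often each keyword appears under each sentiment label. The keywords come from the list above; the regex patterns, column names and sample rows are assumptions made for this sketch.

```python
import re
from collections import Counter

# Keywords from the list above, matched as whole words, case-insensitively.
KEYWORDS = {
    "growth": r"\bgrowth\b",
    "growing": r"\bgrowing\b",
    "grow/grew/grows": r"\b(?:grow|grew|grows)\b",
    "increase": r"\bincrease\b",
    "lost": r"\blost\b",
    "loss": r"\bloss\b",
    "losing": r"\blosing\b",
    "decline": r"\bdecline\b",
    "covid": r"\bcovid\b",
    "undercounted": r"\bundercounted\b",
}
_COMPILED = {name: re.compile(p, re.IGNORECASE) for name, p in KEYWORDS.items()}

def keyword_counts_by_sentiment(rows, text_key="text", label_key="sentiment"):
    """Return {keyword: Counter({sentiment: number of sentences containing it})}."""
    counts = {name: Counter() for name in _COMPILED}
    for row in rows:
        for name, pattern in _COMPILED.items():
            if pattern.search(row[text_key]):
                counts[name][row[label_key]] += 1
    return counts

# Made-up sample rows, for illustration only.
rows = [
    {"text": "Illinois population growth continues", "sentiment": "positive"},
    {"text": "Slowest growing state", "sentiment": "neutral"},
    {"text": "Illinois lost residents", "sentiment": "negative"},
    {"text": "Growth stalls", "sentiment": "negative"},
]
counts = keyword_counts_by_sentiment(rows)
```

A keyword that shows up mostly outside its expected group (for example 'growth' under negative) flags sentences worth reading by hand.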

'Growing' is almost always a positive word, except when it is paired with negative connotations (slowest growing, growing sense of unease). Some of these sentences are incorrectly tagged as neutral when they should be negative. They are modified accordingly below.

'Grow/grows/grew' is almost always positive, but the label needs to be corrected to negative when the word is paired with negative words:

  1. Grow + less
  2. Unrecognized numeric comparison: shrinking 121k vs. grew 74k

Most 'grow/grew' sentences in the negative group are correctly classified as negative. The exceptions are sentences that should be positive but were pulled negative by too many other negative words.

E.g., inaccurate + screwed + actually grew

Most of the 'grow/grows/grew' sentences in the neutral group need to be corrected to positive. A few of them are actually negative.
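
The adjustments described above could be applied with a few phrase rules plus a table of hand-checked labels. This is only a sketch: the two phrase patterns come from the examples given earlier, while the row `id` field, the `manual_labels` mapping and the sample sentences are hypothetical.

```python
import re

# Phrases noted above where 'growing' carries a negative meaning.
NEGATIVE_PHRASES = [
    re.compile(r"\bslowest[- ]growing\b", re.IGNORECASE),
    re.compile(r"\bgrowing sense of unease\b", re.IGNORECASE),
]

def adjust_label(text, label, manual_label=None):
    """Return the corrected sentiment label for one sentence."""
    if manual_label is not None:
        return manual_label  # a hand-checked label always wins
    if any(p.search(text) for p in NEGATIVE_PHRASES):
        return "negative"
    return label

def apply_adjustments(rows, manual_labels=None):
    """Return copies of rows with corrected labels; manual_labels maps row id to label."""
    manual_labels = manual_labels or {}
    return [
        {**row, "sentiment": adjust_label(row["text"], row["sentiment"],
                                          manual_labels.get(row["id"]))}
        for row in rows
    ]

# Made-up sample rows, for illustration only.
rows = [
    {"id": 1, "text": "Among the slowest growing states", "sentiment": "neutral"},
    {"id": 2, "text": "The state actually grew", "sentiment": "negative"},
    {"id": 3, "text": "A growing economy", "sentiment": "positive"},
]
adjusted = apply_adjustments(rows, manual_labels={2: "positive"})
```

Cases that rules cannot capture, such as the numeric comparison noted above, would go into the manual table after reading the sentence.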

11. Re-visualize Word Cloud after Sentiment Adjustments

Positive Sentiments

12. Text Observation to Understand the Context and Extract Insights

Positive Sentiments

Re-visualize & Observe Negative Sentiments

Re-visualize & Observe Neutral Sentiments

12. Topic Detection with ktrain

Topic Detection for Non-Filtered

Topic Detection for Select Titles

Topic Detection on Final List (Fully Filtered)

13. Entity Recognition

Entity Recognition from All Texts
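
A hedged sketch of extracting and tallying entities. It assumes spaCy with the `en_core_web_sm` model, since the entity types visualized later (ORG, PERSON, NORP, LOC, GPE, FAC) match spaCy's label names; the notebook's actual library is not stated, and the function names and sample pairs are hypothetical.

```python
from collections import Counter, defaultdict

# Entity types visualized later in the notebook.
ENTITY_LABELS = ("ORG", "PERSON", "NORP", "LOC", "GPE", "FAC")

def extract_entities(texts, model_name="en_core_web_sm"):
    """Return (entity text, label) pairs; needs spaCy and the named model installed."""
    import spacy

    nlp = spacy.load(model_name)
    pairs = []
    for doc in nlp.pipe(texts):
        pairs.extend((ent.text, ent.label_) for ent in doc.ents)
    return pairs

def count_by_label(pairs, labels=ENTITY_LABELS):
    """Group entity mentions by label and count each distinct entity text."""
    counts = defaultdict(Counter)
    for text, label in pairs:
        if label in labels:
            counts[label][text] += 1
    return counts

# Hand-written pairs standing in for extract_entities output.
pairs = [("Census Bureau", "ORG"), ("Census Bureau", "ORG"),
         ("Chicago", "GPE"), ("2020", "DATE")]
entity_counts = count_by_label(pairs)
```

The per-label counters can be passed directly to a word cloud as frequencies.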

14. Visualizing Sentiment Over Time: by Proportion (%)

14. Visualizing Sentiment Over Time: by Count
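
Both views (count and proportion) can be derived from the same grouping. The sketch below assumes ISO-formatted date strings and monthly buckets; the column names, the bucket size and the sample rows are illustrative assumptions.

```python
from collections import Counter, defaultdict

def sentiment_counts_by_month(rows, date_key="date", label_key="sentiment"):
    """Return {"YYYY-MM": Counter({label: count})}, sorted by month."""
    counts = defaultdict(Counter)
    for row in rows:
        month = row[date_key][:7]  # "YYYY-MM" prefix of an ISO date string
        counts[month][row[label_key]] += 1
    return dict(sorted(counts.items()))

def to_percentages(counts):
    """Convert per-month counts into per-month percentages."""
    result = {}
    for month, labels in counts.items():
        total = sum(labels.values())
        result[month] = {label: 100.0 * n / total for label, n in labels.items()}
    return result

# Made-up sample rows, for illustration only.
rows = [
    {"date": "2022-01-03", "sentiment": "positive"},
    {"date": "2022-01-10", "sentiment": "positive"},
    {"date": "2022-01-15", "sentiment": "negative"},
    {"date": "2022-01-20", "sentiment": "neutral"},
    {"date": "2021-12-30", "sentiment": "negative"},
]
by_count = sentiment_counts_by_month(rows)
by_percent = to_percentages(by_count)
```

Plotting the percentages shows shifts in tone even when the number of posts per month varies, while the counts show how much data backs each month.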

15. Visualizing Word Cloud of Entities Based on All Texts

Organizations

Persons

NORP (Nationalities or Religious/Political Groups)

LOC (Location)

GPE (Geopolitical Entities)

FAC (Facility/Building)

16. Visualizing Word Cloud of Entities Based on Sentiments

Positive Sentiments

Negative Sentiments

Text Observation to Understand Context and Extract Insights

End of Notebook. Please refer to the presentation for complete interpretations/analysis/insights.